The impact of data quality monitoring of a multicenter prospective registry of cardiac implantable electronic devices

Data quality monitoring plays a crucial role in multicenter prospective registries. By maintaining high data accuracy, completeness, and consistency, researchers can improve the overall quality and reliability of the registry data, enabling meaningful conclusions and supporting evidence-based decisions. The purpose of the present study was to evaluate data quality metrics (completeness, accuracy, and temporal plausibility) of a Multicenter Registry of Cardiac Implantable Electronic Devices (CIEDs) and to perform a direct data audit of a random sample of records to assess the agreement levels with the source documents. The CIED Registry was a prospective, multicenter, real-world observational study carried out from January 2020 to December 2022 in five designated centers across Sao Paulo, Brazil. We assessed the data quality of the CIED Registry by using two distinct approaches:
• Dynamic data monitoring using features of the REDCap (Research Electronic Data Capture) software, including data reports and data quality rules
• Direct data audit in which information from a random sample of 10 % of cases from the coordinating center was compared with original source documents
Our findings suggest that the methodological approach applied to the CIED Registry resulted in high data completeness, accuracy, temporal plausibility, and excellent agreement levels with the source documents.

Subject area: Medicine and Dentistry
More specific subject area: Cardiac Device Registry
Name of your method: Data quality monitoring of multicenter registries
Name and reference of original method: NA
Resource availability: NA

Context and significance
Prospective registries with real-world data are a valuable source of clinical evidence, with significant importance for decision-making, improving the quality of healthcare, formulating evidence-based healthcare policy, and patient safety surveillance [1]. Despite the large volume of publications on clinical registries in the field of Cardiology [2, 3], registries of cardiac implantable electronic devices (CIEDs) such as pacemakers, implantable cardioverter-defibrillators (ICDs) and cardiac resynchronization therapy (CRT) devices are still uncommon in the scientific literature [4, 5].
There are many challenges in conducting prospective multicenter registries. The main requirement is a single database for all centers that gives different user profiles secure access and enables data quality to be monitored in real time [6, 7]. In this sense, adopting digital tools for electronic data collection and management can help overcome these logistical and technological challenges, bringing greater efficiency to the entire study life cycle [8].
Evaluating data quality metrics from prospective registries has gained great prominence in the literature in recent years [9, 10]. Although there is no consensus on which metrics should be adopted, the most cited indicators include completeness, accuracy, and temporal plausibility [11-13]. At the same time, direct data auditing, which involves comparing collected data with primary information sources, has been adopted as an additional strategy to ensure the quality of prospective clinical registries [14, 15].
The hypothesis of the present study is that adopting strategies aimed at monitoring data quality can minimize the error rates and inconsistencies inherent in prospective multicenter registries. Thus, based on the infrastructure for data collection and management implemented for a Multicenter and Prospective CIED Registry, the purpose of the present study was to evaluate the most common data quality metrics (completeness, accuracy, temporal plausibility) and to perform a direct data audit of a random sample of records to assess the agreement levels with the source documents.

Study design
This is a methodological study designed to evaluate data quality metrics from a Prospective Multicenter CIED Registry.

Prospective multicenter registry of cardiac implantable electronic devices
The CIED Registry was a prospective, multicenter, real-world observational study carried out from January 2020 to December 2022 in five designated centers across the state of Sao Paulo, Brazil. The study was coordinated by a referral center for Cardiac Pacing and Electrophysiology located in the city of Sao Paulo, Brazil. All individuals who underwent initial CIED implantation or reoperations for maintenance and/or treatment of complications related to the cardiac device were consecutively included in the CIED Registry.
Data derived from electronic health records and hospital administrative systems were collected at three time points: index hospital admission, and 30 and 180 days after discharge. Demographic, clinical, surgical, and discharge data were collected at hospital admission. In addition to the clinical follow-up data on complications and readmissions that occurred in the postoperative phase (30 and 180 days after discharge), patient-reported outcome (PRO) measures were obtained using standardized instruments in a convenience sample.

Infrastructure of the prospective multicenter registry of cardiac implantable electronic devices
The data collection and management infrastructure of the CIED Registry included the following steps: (1) Data management plan; (2) Defining data element terminology; (3) Developing electronic case report forms (e-CRF) using REDCap (Research Electronic Data Capture); (4) Customization of specific functionalities of REDCap; (5) Research team training; (6) Dynamic monitoring of data quality with REDCap functionalities; and (7) Direct data auditing [16].
Developing e-CRF in REDCap requires careful planning and adherence to good practices to ensure the overall quality of the research protocol. In this study, the format and terminology of required data elements were defined in accordance with international standards, as previously published [16], comprising a comprehensive data dictionary that encompassed all data points to be collected during the patient's journey. Related data elements were logically grouped into specific forms (demographic, clinical, surgical, discharge, follow-up) to facilitate efficient data entry. The use of appropriate field types (text, drop-down list, radio buttons, and checkboxes) was crucial to optimize data entry and, most importantly, to ensure that exported data formats met the requirements of downstream analyses. Additional resources, such as numeric field validation, branching logic, and automatically calculated fields, were adopted in several database forms to minimize potential typing errors and facilitate data quality monitoring.
The database structure consisted of 12 electronic forms with a total of 291 variables, as follows: 184 categorical variables; 54 numeric variables, of which 50 resulted from automatic calculations; 30 variables structured in date format; and 23 variables that allowed free text. Additional details regarding the database structure are presented in Supplementary Material (Table S1).
The main REDCap functionalities used in the study were longitudinal events, offline data collection through the REDCap Mobile App, user access control, creation of multicenter groups, the calendar for scheduling patient evaluations, and the electronic consent form framework. An extensive list of REDCap features used in this project can be found in Supplementary Material (Table S2).
The research team was trained according to the attributes of each user profile. The training sessions covered an overview of the study scope, electronic data collection, monitoring the study workflow, correcting data inconsistencies, and monitoring data quality. These sessions were conducted by videoconference and supplemented with tutorials sent to the participants.

Data collection
Data entry personnel included research coordinators, nurses, physicians, and physicians-in-training (residents) who voluntarily agreed to participate as academic partners. All data, except the PRO measures, were abstracted manually from the patient's electronic medical records to the REDCap database. Only in the coordinating center, surgical data were obtained by the physicians-in-training in the operating room using the REDCap Mobile App; after the procedure, however, surgical data were also registered in the patient's chart.
REDCap surveys were used to capture self-reported PRO measures at follow-up time points. The survey link was sent to the participants via email, SMS, or WhatsApp message, depending on the participant's preference.

Monitoring data quality
The study coordinating center monitored information from the prospective multicenter CIED Registry systematically, on a weekly basis, to ensure data quality. This monitoring included creating data reports and specific data quality rules, which were applied to all datasets. For this step, the main data quality metrics that could be evaluated using REDCap's features were chosen: completeness, which refers to the amount of complete data; accuracy, which refers to the absence of inconsistencies; and temporal plausibility, which comprises the amount of data entered within the time windows pre-established by the study [17].
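As an illustration of how the three metrics described above could be computed outside REDCap, the following minimal Python sketch works on a handful of hypothetical records. The field names, the age range rule, and the 30-day entry window are assumptions made for the example; they are not the data quality rules configured in the CIED Registry.

```python
from datetime import date

# Hypothetical records; field names and values are illustrative only.
records = [
    {"age": 71, "sex": "F", "due": date(2021, 3, 1), "entered": date(2021, 3, 10)},
    {"age": None, "sex": "M", "due": date(2021, 3, 5), "entered": date(2021, 5, 20)},
    {"age": 180, "sex": "M", "due": date(2021, 4, 2), "entered": date(2021, 4, 3)},
    {"age": 64, "sex": "F", "due": date(2021, 4, 9), "entered": date(2021, 4, 30)},
]

REQUIRED = ("age", "sex")
WINDOW_DAYS = 30  # assumed tolerance for timely data entry


def completeness(rows):
    """Share of required fields that are filled in."""
    filled = sum(r[f] is not None for r in rows for f in REQUIRED)
    return filled / (len(rows) * len(REQUIRED))


def accuracy(rows):
    """Share of non-missing ages that pass an assumed range rule (0-120 years)."""
    ages = [r["age"] for r in rows if r["age"] is not None]
    return sum(0 <= a <= 120 for a in ages) / len(ages)


def temporal_plausibility(rows):
    """Share of records entered within WINDOW_DAYS of the scheduled date."""
    on_time = sum(abs((r["entered"] - r["due"]).days) <= WINDOW_DAYS for r in rows)
    return on_time / len(rows)
```

Each function returns a proportion between 0 and 1, so the same helpers could be applied per form or per research site to obtain stratified rates.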

Direct data audit
Direct data auditing was carried out to assess the veracity of the information entered in the database. It was applied to a random sample of 10 % of the records of the study coordinating center, selected using the RStudio software. Data from the prospective multicenter CIED Registry were compared with the original source documents using forms developed in REDCap to classify each audited variable into one of the following options, in accordance with international recommendations [18]: (1) the data corresponds to the source document; (2) the data does not correspond to the source document; (3) the data does not exist in the source document; and (4) the data was not registered in the database.
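The sampling and classification steps can be sketched as follows. The study drew its sample in R Studio; this Python version is only a stand-in, and the record IDs, the seed, and the list of audit codes passed to `agreement_rate` are hypothetical.

```python
import random
from collections import Counter

# The four audit outcomes described in the text.
AUDIT_CODES = {
    1: "matches source document",
    2: "does not match source document",
    3: "not present in source document",
    4: "not registered in database",
}


def draw_audit_sample(record_ids, fraction=0.10, seed=2022):
    """Draw a simple random sample without replacement (seed is arbitrary)."""
    rng = random.Random(seed)
    k = round(len(record_ids) * fraction)
    return sorted(rng.sample(list(record_ids), k))


def agreement_rate(classifications):
    """Share of audited variables coded 1 (matches the source document)."""
    counts = Counter(classifications)
    return counts[1] / len(classifications)


sample = draw_audit_sample(range(1, 501))               # 500 hypothetical record IDs
rate = agreement_rate([1, 1, 1, 2, 1, 4, 1, 1, 3, 1])   # made-up audit codes
```

Fixing the seed makes the draw reproducible, which helps when the list of audited records has to be documented or regenerated.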

Studied variables
Data quality monitoring was applied to all database variables, including demographic data (age, sex and race), clinical data (baseline heart disease, functional class of heart failure and associated comorbidities), surgical data (type of procedure, type of CIED, intraoperative complications), hospitalization data (need for daily stays in the intensive care unit, hospital complications and final outcome), data from the 30 and 180-day clinical follow-up (postoperative complications, readmissions and reoperations in the period), the standardized instruments of PRO measures and the study closure.

Statistical analysis
The quality metrics of each of the studied variables were analyzed descriptively, calculating absolute and relative frequencies, as well as central tendency and dispersion measures. The overall rate of each of the data quality metrics was calculated along with their respective confidence intervals (CI) to complement the descriptive analysis.
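The text does not state which method was used for the confidence intervals. As one possible approach, the sketch below computes a Wilson score interval for a proportion; the choice of method and the counts in the example are assumptions made for illustration.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95 %)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


low, high = wilson_interval(963, 1000)  # hypothetical counts
```

The Wilson interval is often preferred over the simple normal approximation when a rate is close to 0 % or 100 %, as data quality rates tend to be, because its bounds stay inside that range.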
The Kappa coefficient (k) was used for categorical variables and the intraclass correlation coefficient (ICC) for continuous variables to estimate the agreement between the Registry data recorded before the data audit and the values obtained after it. The significance level used for the tests was 5 %.
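For reference, Cohen's kappa compares the observed agreement with the agreement expected by chance, k = (p_o - p_e) / (1 - p_e). The sketch below is a minimal, dependency-free implementation for two paired lists of categorical values; the example labels are hypothetical and the ICC is not covered here.

```python
from collections import Counter


def cohen_kappa(before, after):
    """Cohen's kappa for two paired lists of categorical labels."""
    if len(before) != len(after) or not before:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(before)
    # Observed agreement: share of pairs with identical labels.
    observed = sum(a == b for a, b in zip(before, after)) / n
    # Chance agreement from the marginal frequencies of each label.
    freq_before, freq_after = Counter(before), Counter(after)
    expected = sum(
        freq_before[c] * freq_after[c] for c in set(before) | set(after)
    ) / n**2
    if expected == 1:
        return 1.0  # both sources use a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical functional-class values before and after the audit.
registry = ["I", "II", "II", "III", "II", "I", "IV", "II", "III", "I"]
audited = ["I", "II", "II", "III", "III", "I", "IV", "II", "III", "I"]
kappa = cohen_kappa(registry, audited)
```

Because kappa corrects for chance agreement, a variable with a very unbalanced distribution can show a modest kappa even when only a few records disagree.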

Participants of the CIED registry
During the study period, a total of 2631 consecutive patients were included in the CIED Registry and were followed up for a period of 6.1 ± 1.3 months. Thus, the data quality metrics presented in the next section refer to a total of 2631 records.

Data quality metrics
The overall rates of the studied data quality metrics were satisfactory, averaging 99.9 %, 99.8 %, and 96.3 % for completeness, accuracy, and temporal plausibility, respectively. The completeness and accuracy analysis showed similar behavior across the different datasets evaluated. Regarding completeness, the demographic, clinical, surgical, and patient-reported outcome variables stood out, with 100 % complete data (Table 1). Data quality metrics were also evaluated according to the research site, and the results showed the same pattern observed in the global rates, as presented in Supplementary Material (Table S3). The temporal plausibility rate was lower in the variables related to the study closure (90.8 %) and the clinical follow-up of 180 days (94.5 %). On the other hand, the variables of the standardized PRO questionnaires had the best performance (98.8-99.9 %), with practically all cases entered within the time windows pre-established by the study (Table 2).

Direct data auditing
A direct data audit was performed on 200 (10.1 %) random records from the study coordinating center. A total of 125 (0.9 %) errors were found among the 13,800 audited variables. The agreement rate between the records and the information present in the source documents ranged from 97.8 to 99.6 %, with data related to the surgical procedure presenting the best rates (Table 3).
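The percentages above can be cross-checked from figures given in the text. In the short sketch below, the number of coordinating center records is not stated explicitly, so it is approximated from the 75.3 % share of the 2631 registry records reported for that center; this approximation is an assumption.

```python
total_records = 2631
coordinating_share = 0.753      # share attributed to the coordinating center
audited_records = 200
audited_variables = 13_800
errors = 125

# Approximate count; the exact number is not stated in the text.
coordinating_records = round(total_records * coordinating_share)

sampling_pct = 100 * audited_records / coordinating_records
error_pct = 100 * errors / audited_variables
variables_per_record = audited_variables / audited_records
```

Rounded to one decimal place, `sampling_pct` and `error_pct` reproduce the reported 10.1 % and 0.9 %, and the totals imply 69 audited variables per record.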
The agreement analysis between the registry data collected before and after the direct data audit stage showed excellent levels of agreement for both categorical and numerical variables (k > 0.90 and ICC = 1.00) (Table 4). The mean Kappa coefficient was 0.939 ± 0.07, ranging from 0.660 to 1.000. The variables with the lowest reliability coefficients were presence of endocarditis prior to the surgical procedure (k = 0.660, 95 % CI 0.301-1.000, P < 0.001), heart failure functional class (k = 0.818, 95 % CI 0.746-0.889, P < 0.001) and reason for the initial CIED implant (k = 0.833, 95 % CI 0.735-0.930, P < 0.001). Both age and length of hospital stay correlated perfectly with data from source documents (ICC = 1.000, P < 0.001).

Study implications
The importance of multicenter prospective studies with real-world data is already well established in the literature, and their application has been increasingly widespread in different clinical settings [19]. However, the benefits of this type of study are intrinsically related to the availability of a database infrastructure combined with continuous monitoring of the quality of information [20]. Although these strategies are essential, they are often placed in the background, compromising the reliability and reproducibility of the results of scientific research [21, 22]. Thus, the present study aimed to evaluate the impact of adopting strategies for monitoring data quality in a prospective multicenter registry of cardiac devices.
The CIED Registry covered all types of cardiac devices, as well as any category of surgical procedure related to cardiac pacing. In addition to care quality indicators, the registry included an evaluation of PROs and of the patient's experience with the care received during hospitalization, obtained through standardized questionnaires.

Table 4
Analysis of agreement between data collected in the Multicenter Prospective CIED Registry and the results of the direct data audit.

The categorical and numerical variables showed excellent agreement levels between the data from the CIED Registry and the source documents, higher than the values reported in similar studies [14, 15].
Despite being a multicenter study, the coordinating center was responsible for 75.3 % of the total sample, probably because it is a referral center for cardiovascular care and receives patients referred from other institutions for more complex surgical procedures. In addition, as expected, the COVID-19 pandemic imposed some difficulties on conducting the study. Above all, the sites had a considerable delay in sending the documentation to their respective Research Ethics Committees, and elective surgical procedures were temporarily suspended in some cases. Thus, the main limitation regarding the study sample was not the total number of registered cases, but rather the impossibility of guaranteeing that all patients operated on in the co-participating sites were included in the CIED Registry.
Although considerable effort was devoted to performing the direct data audit, it was not possible to apply this step to the co-participating sites, mainly because the Brazilian General Data Protection Law (LGPD) imposes specific restrictions on access to the patient's medical record by professionals not directly related to care activities [28].
Our findings suggest that the methodological approach applied to the CIED Registry resulted in high data completeness, accuracy, temporal plausibility, and excellent agreement levels with the source documents. Thus, it is reasonable to suggest that data from the prospective multicenter CIED Registry meet the minimum data quality requirements, constituting a reliable, robust, and valid source of information. In addition, the data management infrastructure adopted in this study can be transferred to other medical specialties, improving the overall quality and reliability of patient registries.

Table 1
Data completeness and accuracy rate of the multicenter prospective CIED registry.

Table 2
Temporal plausibility rate of data from the multicenter prospective CIED registry.

Table 3
Findings from the direct data audit performed on a sample of 200 cases from the Multicenter Prospective CIED Registry.